(57) Conventionally, a web site stores Internet data
indicating file access status for the files that have been
accessed in response to requests from web browsers.
Unfortunately, the Internet data are kept as a set of
separate and non-correlated data records that are
chronologically arranged according to the times at which the
requests have been received and processed. Consequently,
the Internet data are not arranged in a format meaningful
to management and business operation. The present
invention correlates web page files (HTML, SHTML,
DHTML, or CGI files) with other type files (GIF, JPEG,
or AVI files), so that the Internet data can be presented
in a format meaningful to management and business
operation.
Description

[0001] The present invention relates generally to a
method and apparatus for presenting Internet data in a
format meaningful to management and business operation.

[0002] With the development of information technology
and networking infrastructure, more and more business
transactions are being conducted electronically over the
Internet. Using the Internet to conduct business
transactions is now so popular that it is commonly known as
electronic commerce (or Internet commerce) by industry and
the public. It is fair to predict that electronic commerce
will have an enormous impact on the way businesses are
conducted and managed in the future. Thus, there is great
interest in studying and understanding consumers' behavior
and decision process in the electronic commerce environment.
[0003] Traditionally, business transactions have been 
conducted at business premises, and there exist meth- 
ods and techniques to study consumer's behaviour and 
decision process for traditional business environment. 
For example, a retailer can display its goods in store 
shelves arranged in accordance with the changes of the 
four seasons. By observing consumers' reactions to the 
arrangement, the retailer can adjust the layout of the 
shelves to facilitate sales of its goods. 
[0004] In the electronic commerce environment, a retailer
or service provider typically displays information about
its goods or services on a web site (which includes at
least one server) via the Internet. Specifically, the
server for the web site stores the information in a set of
web page files, such as HTML (Hypertext Markup Language)
files. In addition to containing text content, an HTML file
may also contain links to other type files, such as graphic
or audio files, for displaying pictures and icons and
playing audio messages. An HTML file may also contain links
to other web page files. The other type files can also be
stored on the server. By using his/her web browser, a
customer (or a potential customer) can remotely navigate
through the web site, gaining information about the goods
and services, or ordering selected goods or services.
Unfortunately, unlike in the traditional business
environment, there is at present no reliable method in the
electronic commerce environment to measure the
effectiveness of the layout of a web site. This is due to
the difficulties in observing consumers' behaviour and
analyzing consumers' decision process over the Internet.

[0005] Historically, the Internet was designed as an open
structure whose main purpose is to exchange information
freely without restriction. To obtain a web page file (such
as an HTML file) from a web site, a web browser first sends
a request to the server for that web site. Upon receiving
the request, the server retrieves the requested HTML file
and sends it to the web browser. Upon receiving the HTML
file, the web browser displays the HTML file as a web page.
If the HTML file also contains links to other type files
(such as graphic or audio files), the browser subsequently
sends requests to the server for these files. Upon
receiving the requests, the server retrieves these files
and sends them to the web browser. Upon receiving these
files, the browser displays pictures and icons on the web
page, or executes an application to play audio files
embedded in the web page. If the HTML file also contains a
link to another HTML file, upon clicking (or activating)
the link, the browser sends a further request to the server
for that HTML file. Upon receiving the further request, the
server retrieves the HTML file and sends it to the web
browser. It should be noted that browsers interact with web
sites in a stateless fashion. On the Internet, a particular
web site can be accessed by thousands of browsers in a
random fashion. While a browser is sending a sequence of
requests to a web site, it does not maintain a constant
connection to that web site between any two consecutive
requests. A server has no control over the sequences of
requests: a subsequent request may not have any logical
relationship with the previous one; a sequence of requests
may come from different web browsers; a request may be
generated from a link embedded in an HTML file.
Consequently, it is difficult to continuously observe
customers' activities and behavior in the electronic
commerce environment over the Internet.

[0006] Current technology provides mechanisms to record
access status data (or Internet data) for web page files
and other type files while a sequence of requests is being
received and processed by a server. However, the Internet
data are kept as a set of separate and non-correlated data
records that are chronologically arranged according to the
times at which the requests were received and processed.
Consequently, Internet data, without further processing,
are not meaningful to management and business operation. In
addition, since Internet data are recorded mainly for the
purpose of administering web sites, they may contain
redundant and erroneous data that are of no use to
management and business operation analysis. When Internet
data are further processed by other applications (such as
data warehouse applications), these redundant and erroneous
data are undesirable because they wastefully occupy storage
space and may cause errors in reports or during analysis.

[0007] Moreover, Internet data may be generated by
different types of servers that may use different formats
to record the Internet data. In other words, Internet data
generated by different types of servers are not compatible
in format. This causes further problems for the utilisation
of Internet data.

[0008] Therefore, there is a need for a method and
apparatus to present Internet data in a format that is
meaningful to management and business operation.
[0009] There is another need for a method and apparatus
to present Internet data in a format that can be further
efficiently utilized.
[0010] There is yet another need for a method and
apparatus to filter Internet data to facilitate further
analysis.

[0011] There is still another need for a method and 
apparatus to combine Internet data from multiple serv- 
ers, possibly from different types of servers, into a co- 
herent format. 

[0012] The present invention meets these needs. 
[0013] According to a first aspect of the present
invention, there is provided a method as set out in
accompanying claim 1.

[0014] According to a second aspect of the present 
invention, there is provided a method as set out in claim 
4. 

[0015] According to a third aspect of the present in- 
vention, there is provided a method as set out in accom- 
panying claim 8. 

[0016] According to another aspect of the present
invention, there is provided apparatus arranged to conduct
a method in accordance with any one of the first three
aspects of the invention.

[0017] The present invention provides a novel method and
associated apparatus for processing Internet data.
[0018] Conventionally, a web site is able to store
Internet data indicating file access status for the files
that have been accessed in response to requests from web
browsers. Unfortunately, the Internet data are kept as a
set of separate and non-correlated data records that are
chronologically arranged according to the times at which
the requests have been received and processed. Typically, a
web page is associated with a web page file, which can
further embed other type files. However, the data
indicating access status for a web page file and for the
other type files embedded in it can be scattered among
multiple data records. Consequently, the Internet data are
not arranged in a format meaningful to management and
business operation.
[0019] The present invention converts the Internet da- 
ta into a format meaningful to management and busi- 
ness operation. More specifically, the present invention 
can correlate the data record for a web page file with the 
data records for other type files that are embedded in 
the web page file. 

[0020] In a broad aspect the invention provides a method
used with a server containing a plurality of web pages and
logs. Each of the web pages contains a web page file and
one or more other type files. Each of the logs contains
data indicating access status for the web page files and
other type files. The method comprises the steps of:

receiving data from the server;

identifying data for web page files that have been
accessed;

identifying data for other type files that are
respectively linked in the web page files; and

correlating the data for the other files that are
respectively linked in the accessed web page files into
the data for the web page files.

[0021] These and other features and advantages of the
present invention will become apparent from the following
description and accompanying drawings, which are given by
way of example.
[0022] The purpose and advantages of the present invention
will be apparent to those skilled in the art from the
following detailed description in conjunction with the
appended drawings, in which:

figure 1 shows an exemplary network system, including a
novel Internet data processing computer, in accordance
with the present invention;
figure 2 shows an exemplary web page associated with a
web page file;
figure 3 shows exemplary data records in server logs;
figure 4 shows a flowchart illustrating the operation of
forming the page map shown in figure 1, in accordance
with the present invention;
figure 5 shows exemplary data records stored in the page
map shown in figure 1, in accordance with the present
invention; and
figure 6 shows an exemplary computer system that is able
to run the utility application shown in figure 1, in
accordance with the present invention.

[0023] The present invention comprises a novel method and
an associated apparatus for presenting Internet data. The
following description is presented to enable any person
skilled in the art to make and use the invention, and is
provided in the context of a particular application and its
requirements. Various modifications to the preferred
embodiment(s) will be readily apparent to those skilled in
the art, and the principles defined herein may be applied
to other embodiments and applications without departing
from the spirit and scope of the invention. Thus, the
present invention is not intended to be limited to the
embodiment(s) shown, but is to be accorded the broadest
scope consistent with the principles and features disclosed
herein.

[0024] Referring to figure 1, there is shown an exemplary
network system 100 including Internet 105 and Intranet (or
LAN - Local Area Network) 107, in accordance with the
present invention.

[0025] Connected to Internet 105 are four servers (102.1,
102.2, 102.3, and 102.4) for four respective web sites and
four user terminals or computers (106.1, 106.2, 106.3, and
106.4). Connected to Intranet 107 are the four servers
(102.1, 102.2, 102.3, and 102.4) and a data processing
computer 108. Connected to data processing computer 108 is
a data warehouse 116.
[0026] It should be noted that, in describing the present
invention, figure 1 shows only four servers and four user
computers connected to Internet 105. In reality, Internet
105 connects thousands of servers and user computers.
[0027] Each of the four servers (102.1, 102.2, 102.3, or
102.4) includes a respective web page repository (103.1,
103.2, 103.3, or 103.4) and a respective set of server logs
(104.1, 104.2, 104.3, or 104.4). Each of the four web page
repositories (103.1, 103.2, 103.3, or 103.4) stores a
plurality of web page files (such as HTML, SHTML, DHTML, or
CGI files). A web page file may contain links to other type
files (such as AVI, GIF, JPEG, and PNG files). (Note: HTML
stands for Hypertext Markup Language, SHTML for Secure
HTML, DHTML for Dynamic HTML, CGI for Common Gateway
Interface, GIF for Graphics Interchange Format, JPEG for
Joint Photographic Experts Group, AVI for Audio Video
Interleave, and PNG for Portable Network Graphics.) The
other type files are also stored in one of the four
servers. Each of the four sets of server logs (104.1,
104.2, 104.3, or 104.4) contains access status data (or
Internet data) indicating access status for the files that
have been accessed, or for which access has been attempted.

[0028] Each of the four user computers (106.1, 106.2,
106.3, or 106.4) runs a respective web browser (108.1,
108.2, 108.3, or 108.4), each of which is able to obtain
files from any one of the four servers via Internet 105 and
to display these files in a web page format. To obtain a
web page file from a server, a web browser sends a Get
request to that server. A Get request contains the IP
address identifying the user computer on which the browser
is being run and a URL (Uniform Resource Locator). The URL
contains the name of and path to the web page file. Upon
receiving the Get request, the server retrieves the web
page file according to the URL in the Get request and sends
the web page file to the user computer (on which the
browser is being run) identified by the IP address in the
Get request. The server then records access status data for
the web page file in a server log. Upon receiving the web
page file, the web browser displays it as a web page. If
the web page file also contains links to other type files,
the browser further sends Get requests to the server, so
that these files can be obtained and displayed together
with the web page file. The links embedded in the web page
file contain the names of and paths to these files. After
sending these files to the browser, the server records
access status data for these files in the server log. If
the web page file further contains a link to another web
page file, in response to clicking (activating) the link,
the browser sends a Get request to the server, so that the
web page file can be obtained and a new web page can be
displayed. This link contains the name of and path to the
web page file. After sending this web page file to the user
computer (on which the browser is being run), the server
records access status data for the web page file in the
server log.

[0029] It should be noted that in figure 1 browsers
(108.1, 108.2, 108.3, and 108.4) interact with servers
(102.1, 102.2, 102.3, and 102.4) in a stateless fashion.
The web browsers (108.1, 108.2, 108.3, and 108.4) send
requests to servers (102.1, 102.2, 102.3, and 102.4) in a
random manner. While a browser (108.1, 108.2, 108.3, or
108.4) is sending a sequence of requests to a server
(102.1, 102.2, 102.3, or 102.4), it does not maintain a
constant connection to that server between any two
consecutive requests. A server has no control over the
sequences of requests: a subsequent request may not have
any logical relationship with the previous one; a sequence
of requests may come from different web browsers; a request
may be generated from a link embedded in a web page file.
Consequently, the Internet data are kept as a set of
separate and non-correlated data records that are
chronologically generated according to the times at which
the requests were received and processed. Thus, the
Internet data stored in the four sets of server logs
(104.1, 104.2, 104.3, and 104.4), without further
processing, are not meaningful to management and business
operation.

[0030] As shown in figure 1, data processing computer 108
contains a utility application 112, a page map 114, and a
loading utility 115. Via Intranet 107, utility application
112 is able to access the four sets of server logs (104.1,
104.2, 104.3, and 104.4), to collect data from them, to
process the data collected, and to store the processed data
in page map 114. Loading utility 115 is able to load the
data from page map 114 to data warehouse 116 for further
processing.
[0031] Referring to figure 2, there is shown a portion of
a web page 200, which is associated with a web page file
(HTML, SHTML, DHTML, or CGI file) 201.

[0032] As shown in figure 2, the portion of web page 200
contains six regions, including: a text region 202; a
graphic region 204, which is associated with a link 205 to
a GIF file; a graphic region 206, which is associated with
a link 207 to a JPEG file; a multimedia region 208, which
is associated with a link 209 to an AVI file; a region 214,
which is associated with a link 215 to other portions of
web page 200; and a region 216, which is associated with a
link 217 to another web page file. Links 205, 207, 209, 215
and 217 are embedded in web page file 201.

[0033] Referring to figure 3, there is shown a plurality
of exemplary data records in server logs (104.1, 104.2,
104.3, or 104.4) in some detail.

[0034] As shown in figure 3, four records J1-4 indicate
the access status for web page file 201 and the other type
files (GIF, JPEG and AVI files) that are linked in web page
file 201. To better describe the process of generating the
four records (J1-4), it is assumed that: (1) web page file
201 is stored in web page repository 103.1, (2) web page
file 201 has been accessed by browser 108.1, (3) server
102.1 generates records J1-4 in server logs 104.1, and (4)
the four browsers (108.1, 108.2, 108.3, and 108.4) are all
sending Get requests to server 102.1.
[0035] To obtain web page file 201, browser 108.1 sends a
Get request to server 102.1 via Internet 105. The Get
request contains the IP address assigned to user computer
106.1 and a URL indicating the name of and path to web page
file 201. Upon receiving the Get request, server 102.1
retrieves web page file 201 from web page repository 103.1
and sends it, via Internet 105, to user computer 106.1
according to the IP address contained in the Get request.
In the meantime, server 102.1 stores information indicating
access status for web page file 201 into record J1. Since
links 205, 207, and 209 are embedded in web page file 201
to link the GIF, JPEG and AVI files respectively, web
browser 108.1 further sends three Get requests to server
102.1. Links 205, 207 and 209 contain the file names of and
paths to the GIF, JPEG, and AVI files, respectively. In
addition to containing the IP address assigned to user
computer 106.1, the three Get requests contain the file
names of and paths to the GIF, JPEG, and AVI files,
respectively. Upon receiving the three Get requests, server
102.1 retrieves the GIF, JPEG and AVI files from web page
repository 103.1 and sends them, via Internet 105, to user
computer 106.1 according to the IP address contained in the
Get requests. In the meantime, server 102.1 stores
information indicating access status for the GIF, JPEG, and
AVI files into records J2, J3, and J4, respectively. As
shown in figure 3, data records J1-4 are scattered among
the other records in server logs 104.1, because the four
browsers (108.1, 108.2, 108.3, and 108.4) are all sending
Get requests to server 102.1, and data records in server
logs 104.1 are chronologically generated according to the
times when Get requests have been received and processed by
server 102.1. It should be noted that, even though figure 3
depicts a process of generating access status information
for web page file 201 having a particular web page layout,
the principle of figure 3 applies to any web page file
having any web page layout.
[0036] Typically, each of the records in server logs
(104.1, 104.2, 104.3, and 104.4) contains the following
fields:

IP address assigned to the user computer,
user's domain name,
name of the request (such as Get),
time stamp at which the request was received,
URL (including access path to the file and parameters
passed),
server name,
IP address of the server,
bytes received from the browser,
bytes sent to the browser, and
status code indicating operational status of processing
the request.
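
The record layout listed above could be modeled as a simple data structure. The following is a minimal sketch only: the class name `LogRecord`, the field names, and the sample values are illustrative assumptions and do not appear in the specification or in figure 3.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:
    """One server-log record with the fields listed above (names are hypothetical)."""
    client_ip: str        # IP address assigned to the user computer
    client_domain: str    # user's domain name
    method: str           # name of the request, such as "GET"
    timestamp: datetime   # time at which the request was received
    url: str              # access path to the file plus any parameters passed
    server_name: str
    server_ip: str
    bytes_received: int   # bytes received from the browser
    bytes_sent: int       # bytes sent to the browser
    status: int           # status code for processing the request


# Illustrative values only; they are not taken from figure 3.
record = LogRecord(
    client_ip="192.0.2.10",
    client_domain="example.com",
    method="GET",
    timestamp=datetime(1999, 1, 1, 12, 0, 0),
    url="/catalog/index.html",
    server_name="www",
    server_ip="192.0.2.1",
    bytes_received=310,
    bytes_sent=4096,
    status=200,
)
```

Holding every record in one common shape like this is one way the later steps could treat logs from different servers uniformly.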

[0037] Referring to figure 4, there is shown a flowchart
illustrating the operation of forming page map 114 by
utility application 112 shown in figure 1, in accordance
with the present invention.

[0038] In step 402, utility application 112 collects
Internet data stored in server logs (104.1, 104.2, 104.3,
and 104.4) via Intranet 107.

[0039] In step 404, utility application 112 identifies
which types of servers have generated the Internet data,
because the four sets of server logs (104.1, 104.2, 104.3,
and 104.4) can be generated by different types of servers.
For example, the four servers (102.1, 102.2, 102.3, and
102.4) shown in figure 1 can be a web server, a hosting web
server with virtual domains, a commerce server, and a proxy
server, respectively. Since different types of servers may
generate Internet data with different formats, the data
format and content in one set of server logs (104.1, 104.2,
104.3, or 104.4) may be different from those in the other
three sets of server logs. By identifying the server type,
utility application 112 can process the Internet data in a
way that is suitable to the data format and content in the
identified server logs. In doing so, utility application
112 can process and combine Internet data generated by
different types of servers. In the present invention, the
server type can be identified by the fields included and
the order of the fields in the server logs.
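
Identification of the server type from the fields and their order, as described for step 404, might be sketched as follows. The layouts in `KNOWN_LAYOUTS`, the type labels, and both function names are hypothetical; the specification does not define concrete field orders for any server product.

```python
# Hypothetical field layouts; real server products define their own log formats.
KNOWN_LAYOUTS = {
    "web_server": ("client_ip", "timestamp", "method", "url", "status", "bytes_sent"),
    "proxy_server": ("timestamp", "client_ip", "status", "bytes_sent", "method", "url"),
}


def identify_server_type(header_fields):
    """Return the server type whose layout matches the fields and their order."""
    fields = tuple(header_fields)
    for server_type, layout in KNOWN_LAYOUTS.items():
        if fields == layout:
            return server_type
    return None  # unknown layout; the caller decides how to handle it


def normalize(server_type, values):
    """Map positional values onto field names so records from all servers align."""
    return dict(zip(KNOWN_LAYOUTS[server_type], values))
```

Once each record is keyed by field name rather than by position, records from differently formatted logs can be combined and processed by the same code.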

[0040] In step 406, utility application 112 removes
non-useful data from the data collected in step 402. By way
of example, a backspace in a URL is a non-useful character;
one of the two slashes in "//" in a URL is a non-useful
character because "//" has the same meaning as "/" to a
server. Thus, the backspace and one "/" can be removed. By
way of another example, the data in a record for retrieving
a file associated with an unrecognizable URL is not useful,
because no file can be found in response to the URL. Thus,
the whole record can be removed. Typically, the status code
field in a data record indicates whether a request has been
successfully processed or not. This step is advantageous
because server logs may contain a huge volume of data.
Keeping non-useful data in applications, such as data
warehouse applications, is not only wasteful of storage
space but may also cause errors in reports and during
analysis.
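
The filtering described for step 406 might look like the following sketch. It assumes that each record is a dictionary with a `url` field holding a path without a scheme (so there is no "http://" to preserve) and an integer `status` field, and that status codes in the 200 range mean success. These names and the success criterion are assumptions of the example, not details given in the specification.

```python
import re


def clean_url(url):
    """Drop backspace characters and collapse repeated slashes in a URL path."""
    url = url.replace("\x08", "")      # a backspace carries no meaning in a URL
    return re.sub(r"/{2,}", "/", url)  # "//" means the same as "/" to a server


def remove_non_useful(records):
    """Keep only records whose request succeeded, with cleaned URLs.

    Assumes 2xx status codes indicate success; failed requests (for example,
    an unrecognizable URL) are dropped as whole records.
    """
    cleaned = []
    for record in records:
        if 200 <= record["status"] < 300:
            cleaned.append({**record, "url": clean_url(record["url"])})
    return cleaned
```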

[0041] In step 408, utility application 112 identifies
records that store data indicating file access status for
web page files (HTML, SHTML, DHTML, or CGI files). In the
example shown in figure 3, record J1 for web page file 201
shown in figure 2 will be identified in step 408.

[0042] In step 410, utility application 112 identifies
records that store data indicating file access status for
other type files (such as GIF, JPEG and AVI files) that are
linked into respective web page files. In the example shown
in figure 3, records J2-4 can be identified as being linked
to web page file 201 shown in figure 2.
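
One plausible way to perform the identification in steps 408 and 410 is to inspect the file extension in the URL field. The sketch below assumes this approach; the specification names the file types but does not state how they are detected. The function name `classify`, the return labels, and the `.jpg` alias are likewise assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

WEB_PAGE_EXTENSIONS = {".html", ".shtml", ".dhtml", ".cgi"}
# ".jpg" is an assumed alias for JPEG; the text lists only GIF, JPEG, AVI, and PNG.
OTHER_TYPE_EXTENSIONS = {".gif", ".jpeg", ".jpg", ".avi", ".png"}


def classify(url):
    """Return "web_page", "other", or "unknown" based on the file extension."""
    path = urlsplit(url).path                    # strips any "?query" parameters
    suffix = PurePosixPath(path).suffix.lower()  # extension, case-insensitive
    if suffix in WEB_PAGE_EXTENSIONS:
        return "web_page"
    if suffix in OTHER_TYPE_EXTENSIONS:
        return "other"
    return "unknown"
```

Classifying by extension only separates the two kinds of records; deciding which other type files belong to which web page file is the job of the correlation step that follows.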

[0043] In step 412, utility application 112 correlates
the records for the identified other type files with their
respective identified web page files by using the IP
address (assigned to the user computer running the browser)
and time stamp fields in these records. As described above,
if any other type files are linked into a web page file,
then after a browser has received the web page file, the
browser immediately sends requests to retrieve the other
type files. Hence, the IP address in the request for
retrieving the web page file is the same as the IP address
in the requests for retrieving the other type files. Also,
the time at which the request for retrieving the web page
file was received should be close to the times at which the
requests for retrieving the other type files were received.
Therefore, utility application 112 correlates the following
records together:

(1) a particular record for a particular web page file,
which contains an IP address and time stamp, and

(2) a set of records for the other type files, which
contain the same IP address as that in the particular
record, and contain time stamps close to (within one or
two seconds, for example) that in the particular record.

In the example shown in figure 3, records J2-4 can be
correlated with record J1.
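
The correlation rule of step 412 (same IP address, time stamps within a short window) can be sketched as below. The dictionary keys `ip`, `time`, `kind`, and `url`, the two-second default window, and the choice to attach each other type record to the most recent web page record from the same address are assumptions made for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=2)  # "within one or two seconds" in the text


def correlate(records, window=WINDOW):
    """Group each other-type record under the latest page record from the same IP.

    Each record is a dict with "ip", "time" (datetime), "kind" ("web_page" or
    "other"), and "url". Records falling outside the window are left out.
    """
    pages = []         # correlated output, one entry per web page record
    latest_page = {}   # ip -> most recent page entry seen for that address
    for record in sorted(records, key=lambda r: r["time"]):
        if record["kind"] == "web_page":
            entry = {"page": record, "others": []}
            pages.append(entry)
            latest_page[record["ip"]] = entry
        elif record["kind"] == "other":
            entry = latest_page.get(record["ip"])
            if entry and record["time"] - entry["page"]["time"] <= window:
                entry["others"].append(record)
    return pages
```

Because the grouping is keyed on the IP address, records from other browsers that are interleaved chronologically in the same log do not disturb the result.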

[0044] In step 414, for each of the web page files,
utility application 112 calculates a length by combining
the bytes sent for the web page file with the bytes sent
for the other type files linked in that web page file. In
the example shown in figure 2, the bytes sent for web page
file 201 will be combined with the bytes sent for the GIF,
JPEG and AVI files. The length is useful for an Internet
Service Provider in managing its operation, because it
provides the information needed to determine the bandwidth
used and the cost of sending these files.
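
The combined length of step 414 is a simple sum over the correlated records. In this sketch the `bytes_sent` key, the function name, and all byte counts are illustrative assumptions; none of the figures come from the specification.

```python
def combined_length(page_record, other_records):
    """Total bytes sent for a web page file plus the other type files linked in it."""
    return page_record["bytes_sent"] + sum(r["bytes_sent"] for r in other_records)


# Illustrative byte counts only; they are not taken from figure 3.
page = {"url": "/index.html", "bytes_sent": 4096}
others = [
    {"url": "/logo.gif", "bytes_sent": 2048},
    {"url": "/photo.jpeg", "bytes_sent": 30000},
    {"url": "/clip.avi", "bytes_sent": 500000},
]
total = combined_length(page, others)  # 4096 + 2048 + 30000 + 500000 = 536144
```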
[0045] In step 416, utility application 112 stores the
data processed in steps 406, 408, 410, 412, and 414 in
page map 114 shown in figure 1.
[0046] Referring to figure 5, there is shown a plurality 
of exemplary data records in page map 114, in accord- 
ance with the present invention. 

[0047] As shown in figure 5, page map 114 contains a
plurality of data records 502.1, 502.2, ..., 502.i. Each of
the records may include several physical or logical storage
units. Each of the records stores the IP address used by a
browser to retrieve a web page file, and the correlated
information indicating the access status for the web page
file and the other type files linked to the web page file.
Each of the records also stores a combined length for all
the bytes sent for the web page file and the other type
files.
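
A page map record of the kind just described might be represented as follows. The class name `PageMapRecord`, its attributes, and the sample values are hypothetical; figure 5 is not reproduced here, so the actual record layout may differ.

```python
from dataclasses import dataclass, field


@dataclass
class PageMapRecord:
    """One page-map entry: a web page access plus its correlated file accesses."""
    client_ip: str                                   # IP address used by the browser
    page_url: str
    page_status: int                                 # access status of the web page file
    other_files: dict = field(default_factory=dict)  # url -> access status
    combined_length: int = 0                         # bytes for page + other type files

    def add_file(self, url, status, bytes_sent):
        """Attach one correlated other-type file and grow the combined length."""
        self.other_files[url] = status
        self.combined_length += bytes_sent


# Illustrative values only.
entry = PageMapRecord("192.0.2.10", "/index.html", 200, combined_length=4096)
entry.add_file("/logo.gif", 200, 2048)
```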

[0048] Referring to figure 6, there is shown an exemplary
computer system 600 used as the data processing computer to
run utility application 112, in accordance with the present
invention.

[0049] As shown in figure 6, computer system 600
comprises a processing unit 602, a memory device 604, a
hard disk 606, a disk drive interface 608, a display
monitor 610, a display interface 612, a bus interface 624,
a mouse 625, a keyboard 626, a network communication
interface 634, and a system bus 614.
[0050] Hard disk 606 is coupled to disk drive interface
608, display monitor 610 is coupled to display interface
612, and mouse 625 and keyboard 626 are coupled to bus
interface 624. Coupled to system bus 614 are: processing
unit 602, memory device 604, disk drive interface 608,
display interface 612, bus interface 624, and network
communication interface 634.
[0051] Memory device 604 is able to store programs
(including instructions and data). Operating together with
disk drive interface 608, hard disk 606 is also able to
store programs. However, memory device 604 has faster
access speed than hard disk 606, while hard disk 606 has
higher capacity than memory device 604.
[0052] Operating together with display interface 612,
display monitor 610 is able to provide a visual interface
between programs being executed and a user.
[0053] Operating together with bus interface 624, mouse
625 and keyboard 626 are able to provide inputs to computer
system 600.

[0054] Network communication interface 634 is able 
to provide an interface between computer system 600 
and Intranet 107. 
[0055] Processing unit 602, which may include one or
more processors, has access to memory device 604 
and hard disk 606, and is able to control operations of 
the computer by executing programs stored in memory 
device 604 or hard disk 606. Processing unit 602 is also 
able to control the transmissions of programs and data 
between memory device 604 and hard disk 606. 
[0056] In the present invention, utility application 112 
can be stored in either memory device 604 or hard disk 
606, and be executed by processing unit 602. 
[0057] While the invention has been illustrated and 
described in detail in the drawing and foregoing descrip- 
tion, it should be understood that the invention may be 
implemented through alternative embodiments within 
the spirit of the present invention. Thus, the scope of the 
invention is not intended to be limited to the illustration 
and description in this specification, but is to be defined 
by the appended claims. 



Claims

1. For use with a server that contains a plurality of web
pages and logs, each of the web pages containing a web
page file and one or more other type files, each of the
logs containing data for indicating access status of web
page files and the other type files, a method comprising
the steps of:

receiving data from the server;
identifying data for a web page file;
identifying data for other type files that are linked in
said web page file; and
correlating said data for said other type files with
said data for said web page file.

2. The method of claim 1, said web page file, or each of
said other type files linked in said web page file, having
a file length, and the method further comprising the step
of:

combining file lengths of said web page file and said
other type files that are linked in said web page file.
3. The method of claim 1 or claim 2, further comprising
the step of:

filtering non-usable character(s) from said data for
indicating access status of said web page file and said
data for indicating access status of said other type
files.

4. For use with a server that contains a plurality of web
pages and logs, each of the web pages containing a web
page file and one or more other type files, each of the
logs containing data for indicating access status of web
page files and other type files, a method comprising the
steps of:

receiving data from the server;
identifying data for web page files that have been
accessed;
identifying data for other type files that are
respectively linked in said accessed web page files; and
correlating said data for said other files that are
respectively linked in said accessed web page files into
said data for said accessed web page files.

5. The method of claim 4, further comprising the step 
of: 

filtering non-usable character(s) from said da- 
ta for indicating access status of said accessed web 
page files and said data for indicating access status 
of said accessed other type files. 

6. The method of claim 4 or claim 5, the web pages 
being accessed by different types of users, the 
method further comprising the step of: 

sorting said correlated data in accordance 
with the different types of users. 

7. The method of claim 4, claim 5 or claim 6, each one 
of said web page files, or each of said other type 
files having a file size, and the method further com- 
prising the step of: 

combining file sizes of said other type files 
linked in respective web page files. 

8. For use with a plurality of servers with each of them
containing a plurality of web pages and logs, each of the
web pages containing a web page file and one or more other
type files, each of the logs storing data for indicating
access status of web page files or other type files, a
method comprising the steps of:

receiving data from any one of the servers;
identifying a server type for said one server;
identifying data for the web page files that are stored
in said one server and have been accessed;
identifying information for other type files that are
linked in said respective accessed web page files stored
in said one server; and
correlating said data for said other type files that are
linked in said respective accessed web page files with
said data for said accessed web page files stored in
said one server.

9. The method of claim 8, each of said web page files and
each of said other type files having a file size, and the
method further comprising the step of:

combining file sizes of said files linked in said
respective accessed web page.

10. The method of claim 8 or claim 9, further comprising
the step of:

filtering non-usable character(s) from said identified
data.

11. The method of claim 8, claim 9 or claim 10, the
accessed web page files and other type files being
accessed by different regions, the method further
comprising the step of:

sorting said correlated data in accordance with the
different regions.

12. The method of any one of the preceding claims,

wherein said web page file is an HTML file, a DHTML
file, an SHTML file, or a CGI file; and
wherein said other type files are GIF files, JPEG files,
or AVI files.

13. The method of any one of the preceding claims,
further comprising the step of:

storing said correlated data into an output file.